# A Comprehensive Survey on Graph Neural Networks

The recent success of neural networks has boosted research on pattern recognition and data mining. Many machine learning tasks such as object detection [1], [2], machine translation [3], [4], and speech recognition [5], which once relied heavily on handcrafted feature engineering to extract informative feature sets, have recently been revolutionized by various end-to-end deep learning paradigms, e.g., convolutional neural networks (CNNs) [6], recurrent neural networks (RNNs) [7], and autoencoders [8]. The success of deep learning in many domains is partially attributed to rapidly developing computational resources (e.g., GPUs), the availability of big training data, and the effectiveness of deep learning at extracting latent representations from Euclidean data (e.g., images, text, and videos). Taking image data as an example, we can represent an image as a regular grid in Euclidean space. A CNN is able to exploit the shift-invariance, local connectivity, and compositionality of image data [9]. As a result, CNNs can extract locally meaningful features that are shared across the entire data set for various image analysis tasks.

While deep learning effectively captures hidden patterns of Euclidean data, there is an increasing number of applications where data are represented in the form of graphs. For example, in e-commerce, a graph-based learning system can exploit the interactions between users and products to make highly accurate recommendations. In chemistry, molecules are modeled as graphs, and their bioactivity needs to be identified for drug discovery. In a citation network, papers are linked to each other via citations, and they need to be categorized into different groups. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Because graphs can be irregular, a graph may have a variable number of unordered nodes, and nodes in a graph may have different numbers of neighbors. As a result, some important operations (e.g., convolutions) are easy to compute in the image domain but difficult to apply to the graph domain. Furthermore, a core assumption of existing machine learning algorithms is that instances are independent of each other. This assumption no longer holds for graph data, because each instance (node) is related to others by links of various types, such as citations, friendships, and interactions.

Recently, there has been increasing interest in extending deep learning approaches to graph data. Motivated by CNNs, RNNs, and autoencoders from deep learning, new generalizations and definitions of important operations have been rapidly developed over the past few years to handle the complexity of graph data. For example, a graph convolution can be generalized from a 2D convolution. As illustrated in Figure 1, an image can be considered a special case of a graph in which each pixel is connected to its adjacent pixels. Similar to 2D convolution, one may perform graph convolutions by taking the weighted average of a node's neighborhood information.
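The weighted-average view of graph convolution can be sketched in a few lines of NumPy. This is a minimal illustration rather than any specific model from the survey: the function name `mean_aggregate`, the use of self-loops, and the uniform weights are all assumptions made here for clarity.

```python
import numpy as np

def mean_aggregate(adj, features):
    """Average each node's own features with those of its neighbors.

    Uniform weights and self-loops are illustrative assumptions; real
    models typically learn or normalize the weights differently.
    """
    adj_self = adj + np.eye(adj.shape[0])          # include the node itself
    degree = adj_self.sum(axis=1, keepdims=True)   # neighbors + self
    return (adj_self / degree) @ features

# Path graph 0 - 1 - 2 with one scalar feature per node.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.array([[1.0], [2.0], [6.0]])

smoothed = mean_aggregate(adj, features)
# node 0: (1 + 2) / 2, node 1: (1 + 2 + 6) / 3, node 2: (2 + 6) / 2
```

On an image grid every pixel has the same number of neighbors, so this average reduces to a fixed-size filter; on a general graph the neighborhood size varies per node, which is why the degree normalization is needed.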

There are a limited number of existing reviews on the topic of graph neural networks (GNNs). Using the term geometric deep learning, Bronstein et al. [9] give an overview of deep learning methods in the non-Euclidean domain, including graphs and manifolds. Although it is the first review on GNNs, that survey mainly covers convolutional GNNs. Hamilton et al. [10] cover a limited number of GNNs with a focus on addressing the problem of network embedding. Battaglia et al. [11] position graph networks as the building blocks for learning from relational data, reviewing a subset of GNNs under a unified framework. Lee et al. [12] conduct a partial survey of GNNs that apply different attention mechanisms. In summary, existing surveys include only some of the GNNs and examine a limited number of works, thereby missing the most recent developments in GNNs. Our survey provides a comprehensive overview of GNNs, both for interested researchers who want to enter this rapidly developing field and for experts who would like to compare GNN models. To cover a broader range of methods, this survey considers GNNs to be all deep learning approaches for graph data.

Organization of our survey. The rest of this survey is organized as follows. Section II outlines the background of graph neural networks, lists commonly used notations, and defines graph-related concepts. Section III clarifies the categorization of graph neural networks. Sections IV-VII provide an overview of graph neural network models. Section VIII presents a collection of applications across various domains. Section IX discusses the current challenges and suggests future directions. Section X summarizes the paper.

## III. Categorization and Frameworks

In this section, we present our taxonomy of graph neural networks (GNNs), as shown in Table II. We categorize GNNs into recurrent graph neural networks (RecGNNs), convolutional graph neural networks (ConvGNNs), graph autoencoders (GAEs), and spatial-temporal graph neural networks (STGNNs). Figure 2 gives examples of various model architectures. In the following, we give a brief introduction to each category.

### Taxonomy of Graph Neural Networks (GNNs)

Recurrent graph neural networks (RecGNNs) are mostly pioneering works of graph neural networks. RecGNNs aim to learn node representations with recurrent neural architectures. They assume that a node in a graph constantly exchanges information with its neighbors until a stable equilibrium is reached. RecGNNs are conceptually important and inspired later research on convolutional graph neural networks. In particular, the idea of message passing is inherited by spatial-based convolutional graph neural networks.

Convolutional graph neural networks (ConvGNNs) generalize the operation of convolution from grid data to graph data. The main idea is to generate a node $v$'s representation by aggregating its own features $x_v$ and its neighbors' features $x_u$, where $u \in N(v)$. Unlike RecGNNs, ConvGNNs stack multiple graph convolutional layers to extract high-level node representations. ConvGNNs play a central role in building many other complex GNN models. Figure 2a shows a ConvGNN for node classification. Figure 2b demonstrates a ConvGNN for graph classification.

Graph autoencoders (GAEs) are unsupervised learning frameworks that encode nodes or graphs into a latent vector space and reconstruct graph data from the encoded information. GAEs are used to learn network embeddings and graph generative distributions. For network embedding, GAEs learn latent node representations by reconstructing graph structural information such as the graph adjacency matrix. For graph generation, some methods generate the nodes and edges of a graph step by step, while other methods output a graph all at once. Figure 2c presents a GAE for network embedding.

Spatial-temporal graph neural networks (STGNNs) aim to learn hidden patterns from spatial-temporal graphs, which are becoming increasingly important in a variety of applications such as traffic speed forecasting [72], driver maneuver anticipation [73], and human action recognition [75]. The key idea of STGNNs is to consider spatial dependency and temporal dependency at the same time. Many current approaches combine graph convolutions, which capture the spatial dependency, with RNNs or CNNs, which model the temporal dependency. Figure 2d illustrates an STGNN for spatial-temporal graph forecasting.

### Framework of GNN

With the graph structure and node content information as inputs, the outputs of GNNs can focus on different graph analytics tasks with one of the following mechanisms:

Node-level outputs relate to node regression and node classification tasks. RecGNNs and ConvGNNs can extract high-level node representations by information propagation.

Edge-level outputs relate to edge classification and link prediction tasks. With two nodes' hidden representations from GNNs as inputs, a similarity function or a neural network can be used to predict the label or connection strength of an edge.

Graph-level outputs relate to the graph classification task. To obtain a compact representation at the graph level, GNNs are often combined with pooling and readout operations. Detailed information about pooling and readouts is reviewed in Section V-C.
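The edge-level and graph-level mechanisms above can be illustrated with a short NumPy sketch. It assumes node embeddings `H` have already been produced by some GNN; the helper names `link_score` and `readout`, the dot-product similarity, and the simple sum/mean/max readouts are illustrative choices, not the specific operators reviewed in the survey.

```python
import numpy as np

def link_score(h_u, h_v):
    """Edge-level output: sigmoid of the dot product of two node embeddings.

    The dot product is one possible similarity function (an assumption here);
    a small neural network could be used instead.
    """
    return 1.0 / (1.0 + np.exp(-np.dot(h_u, h_v)))

def readout(node_embeddings, mode="mean"):
    """Graph-level output: collapse all node embeddings into one vector."""
    if mode == "mean":
        return node_embeddings.mean(axis=0)
    if mode == "sum":
        return node_embeddings.sum(axis=0)
    if mode == "max":
        return node_embeddings.max(axis=0)
    raise ValueError(f"unknown readout mode: {mode}")

# Hypothetical embeddings for a 3-node graph (one row per node).
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

score_01 = link_score(H[0], H[1])   # orthogonal embeddings, dot product 0
score_02 = link_score(H[0], H[2])   # overlapping embeddings, dot product 1
graph_vec = readout(H, mode="sum")  # one vector for the whole graph
```

The readout is permutation-invariant by construction: reordering the rows of `H` leaves `graph_vec` unchanged, which matters because graph nodes have no canonical order.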

Training Frameworks. Many GNNs (e.g., ConvGNNs) can be trained in a (semi-)supervised or purely unsupervised way within an end-to-end learning framework, depending on the learning task and the label information available.

Semi-supervised learning for node-level classification. Given a single network in which some nodes are labeled and the others remain unlabeled, ConvGNNs can learn a robust model that effectively identifies the class labels of the unlabeled nodes [22]. To this end, an end-to-end framework can be built by stacking a couple of graph convolutional layers followed by a softmax layer for multi-class classification.
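A minimal forward pass for this setup might look like the sketch below. It assumes a symmetrically normalized adjacency matrix with self-loops, a ReLU nonlinearity, and random untrained weights; these choices, along with all function names, are illustrative assumptions rather than the exact formulation of any cited model. The key point is that every node participates in propagation, but only labeled nodes contribute to the loss.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (an assumed choice)."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(adj_norm, x, w1, w2):
    """Two stacked graph convolutional layers followed by a softmax."""
    h = np.maximum(adj_norm @ x @ w1, 0.0)   # layer 1 + ReLU
    return softmax(adj_norm @ h @ w2)        # layer 2 + softmax

def masked_cross_entropy(probs, labels, labeled_mask):
    """Cross-entropy computed over the labeled nodes only."""
    idx = np.flatnonzero(labeled_mask)
    return -np.log(probs[idx, labels[idx]]).mean()

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3))            # 4 nodes, 3 input features
w1 = rng.normal(size=(3, 5))           # untrained weights, for illustration
w2 = rng.normal(size=(5, 2))           # 2 classes
labels = np.array([0, 0, 1, 1])
labeled_mask = np.array([True, False, False, True])  # only 2 nodes labeled

probs = forward(normalize_adj(adj), x, w1, w2)
loss = masked_cross_entropy(probs, labels, labeled_mask)
```

In a real training loop the weights would be updated by gradient descent on `loss`, and predictions for the unlabeled nodes would be read from the corresponding rows of `probs`.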

Supervised learning for graph-level classification. Graph-level classification aims to predict the class label(s) for an entire graph [52], [54], [78], [79]. End-to-end learning for this task can be realized with a combination of graph convolutional layers, graph pooling layers, and/or readout layers. While graph convolutional layers are responsible for extracting high-level node representations, graph pooling layers play the role of downsampling, coarsening each graph into a sub-structure each time. A readout layer collapses the node representations of each graph into a graph representation. By applying a multi-layer perceptron and a softmax layer to graph representations, we can build an end-to-end framework for graph classification. An example is given in Figure 2b.

Unsupervised learning for graph embedding. When no class labels are available in graphs, we can learn the graph embedding in a purely unsupervised way within an end-to-end framework. These algorithms exploit edge-level information in two ways. One simple way is to adopt an autoencoder framework in which the encoder employs graph convolutional layers to embed the graph into a latent representation, upon which a decoder is used to reconstruct the graph structure [61], [62]. Another popular way is to use negative sampling, which samples a portion of node pairs as negative pairs, while existing node pairs with links in the graph are positive pairs. A logistic regression layer is then applied to distinguish between positive and negative pairs [42].
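The negative sampling approach can be sketched as follows. This is an illustrative simplification: unlinked node pairs are drawn uniformly at random, and the logistic layer is reduced to a sigmoid over the inner product of two embeddings with no learned weights. The function names and the sampling scheme are assumptions, not the procedure of the cited work.

```python
import numpy as np

def sample_negative_pairs(adj, num_samples, rng):
    """Draw node pairs that are NOT linked in the graph.

    Uniform rejection sampling is an assumption; it would loop forever on a
    complete graph, so a real implementation needs a guard for that case.
    """
    n = adj.shape[0]
    negatives = []
    while len(negatives) < num_samples:
        u, v = rng.integers(0, n, size=2)
        if u != v and adj[u, v] == 0:
            negatives.append((int(u), int(v)))
    return negatives

def pair_logistic_loss(z, positives, negatives):
    """Binary cross-entropy: linked pairs are class 1, sampled pairs class 0."""
    def scores(pairs):
        u, v = np.array(pairs).T
        return np.sum(z[u] * z[v], axis=1)   # inner-product logits
    pos, neg = scores(positives), scores(negatives)
    # -log(sigmoid(s)) = logaddexp(0, -s); -log(1 - sigmoid(s)) = logaddexp(0, s)
    total = np.logaddexp(0.0, -pos).sum() + np.logaddexp(0.0, neg).sum()
    return total / (len(pos) + len(neg))

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
positives = [(0, 1), (1, 2), (2, 3)]           # existing links
negatives = sample_negative_pairs(adj, 3, rng)
z = rng.normal(size=(4, 8))                    # stand-in for GNN embeddings
loss = pair_logistic_loss(z, positives, negatives)
```

Minimizing this loss pushes embeddings of linked nodes toward a high inner product and embeddings of sampled unlinked pairs toward a low one.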

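In Table III, we summarize the main characteristics of representative RecGNNs and ConvGNNs. Input sources, pooling layers, readout layers, and time complexity are compared across the various models. In more detail, we compare only the time complexity of the message passing or graph convolution operation in each model, where $n$ denotes the number of nodes and $m$ the number of edges. Because the methods in [19] and [20] require eigenvalue decomposition, their time complexity is $O(n^3)$. The time complexity of [46] is also $O(n^3)$ due to the pairwise shortest-path computation between nodes. The other methods incur equivalent time complexity, which is $O(m)$ if the graph adjacency matrix is sparse and $O(n^2)$ otherwise. This is because, in these methods, computing the representation of each node $v_i$ involves its $d_i$ neighbors, and the sum of $d_i$ over all nodes exactly equals the number of edges. The time complexity of several methods is missing from Table III. These methods either lack a time complexity analysis in their papers or report the time complexity of their overall models or algorithms.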
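The $O(m)$ argument can be made concrete with an edge-list aggregation. The sketch below is an assumption-laden illustration: the graph is stored as a directed edge list (an undirected edge appears once per direction), and the function name `aggregate_over_edges` is hypothetical. Each edge triggers exactly one feature addition, so the work scales with the number of stored edges rather than with $n^2$.

```python
import numpy as np

def aggregate_over_edges(edges, features):
    """Sum neighbor features with a single pass over the edge list."""
    out = np.zeros_like(features)
    src, dst = edges[:, 0], edges[:, 1]
    np.add.at(out, dst, features[src])   # one addition per stored edge
    return out

# Path graph 0 - 1 - 2, each undirected edge stored in both directions.
edges = np.array([[0, 1], [1, 0], [1, 2], [2, 1]])
features = np.array([[1.0], [2.0], [6.0]])

summed = aggregate_over_edges(edges, features)
in_degree = np.bincount(edges[:, 1], minlength=3)
# The per-node neighbor counts add up to the number of stored edges.
```

A dense implementation would instead multiply by the full $n \times n$ adjacency matrix, touching every entry including the zeros.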

## VI. Graph Autoencoders (GAEs)

Graph autoencoders (GAEs) are deep neural architectures that map nodes into a latent feature space and decode graph information from the latent representations. GAEs can be used to learn network embeddings or to generate new graphs. The main characteristics of selected GAEs are summarized in Table V. In the following, we provide a brief review of GAEs from two perspectives: network embedding and graph generation.

### Network Embedding

A network embedding is a low-dimensional vector representation of a node that preserves the node's topological information. GAEs learn network embeddings by using an encoder to extract the embeddings and a decoder to force the embeddings to preserve graph topological information such as the PPMI matrix and the adjacency matrix.

Earlier approaches mainly employ multi-layer perceptrons to build GAEs for network embedding learning. Deep Neural Network for Graph Representations (DNGR) [59] uses a stacked denoising autoencoder [108] to encode and decode the PPMI matrix via multi-layer perceptrons. Concurrently, Structural Deep Network Embedding (SDNE) [60] uses a stacked autoencoder to jointly preserve the first-order and second-order proximity of nodes. SDNE proposes two loss functions, one on the outputs of the encoder and one on the outputs of the decoder. The first loss function enables the learned network embeddings to preserve first-order proximity by minimizing the distance between a node's network embedding and its neighbors' network embeddings. The first loss function $L_{1st}$ is defined as
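In the notation assumed here, where $E$ is the edge set, $A_{v,u}$ is the adjacency-matrix entry for the pair $(v, u)$, $\mathbf{x}_v$ is the input vector of node $v$, and $enc(\cdot)$ is the encoder, a loss of this kind can be written as

$$
L_{1st} = \sum_{(v,u) \in E} A_{v,u} \, \lVert enc(\mathbf{x}_v) - enc(\mathbf{x}_u) \rVert^2 .
$$

Each term penalizes the squared distance between the embeddings of two connected nodes, weighted by the strength of their connection, so minimizing the sum pulls neighboring nodes together in the latent space.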

### Graph Generation

Given multiple graphs, GAEs are able to learn the generative distribution of graphs by encoding graphs into hidden representations and decoding a graph structure from hidden representations. The majority of GAEs for graph generation are designed to solve the molecular graph generation problem, which has high practical value in drug discovery. These methods propose a new graph either in a sequential manner or in a global manner.

Sequential approaches generate a graph by proposing nodes and edges step by step. Gomez et al. [111], Kusner et al. [112], and Dai et al. [113] model the generation process of SMILES, a string representation of molecular graphs, with deep CNNs and RNNs as the encoder and the decoder, respectively. While these methods are domain-specific, alternative solutions are applicable to general graphs by iteratively adding nodes and edges to a growing graph until a certain criterion is satisfied. Deep Generative Model of Graphs (DeepGMG) [65] assumes that the probability of a graph is the sum over all possible node permutations:
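Written out, with $p(G, \pi)$ as assumed notation for the joint probability of a graph $G$ and one particular ordering of its nodes, this marginalization reads

$$
p(G) = \sum_{\pi} p(G, \pi),
$$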

where $\pi$ denotes a node ordering. It captures the complex joint probability of all nodes and edges in the graph. DeepGMG generates graphs by making a sequence of decisions, namely whether to add a node, which node to add, whether to add an edge, and which node to connect to the new node. The decision process for generating nodes and edges is conditioned on the node states and the graph state of the growing graph, which are updated by a RecGNN. In another work, GraphRNN [66] proposes a graph-level RNN and an edge-level RNN to model the generation process of nodes and edges. The graph-level RNN adds a new node to a node sequence at each step, while the edge-level RNN produces a binary sequence indicating the connections between the new node and the nodes previously generated in the sequence.
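The binary-sequence view of a graph described for GraphRNN can be illustrated without any neural network. The sketch below only shows the data representation under a fixed node ordering, assuming an undirected graph without self-loops; the function names are hypothetical, and the RNNs that would actually predict these bits are omitted.

```python
import numpy as np

def adjacency_to_sequences(adj):
    """For node i (i >= 1), list its connections to nodes 0..i-1 as bits."""
    return [adj[i, :i].astype(int).tolist() for i in range(1, adj.shape[0])]

def sequences_to_adjacency(seqs):
    """Rebuild the symmetric adjacency matrix from the per-node bit lists."""
    n = len(seqs) + 1
    adj = np.zeros((n, n), dtype=int)
    for i, bits in enumerate(seqs, start=1):
        adj[i, :i] = bits   # edges from the new node to earlier nodes
        adj[:i, i] = bits   # mirror them, since the graph is undirected
    return adj

# Star graph: node 0 is connected to nodes 1 and 2.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]])

seqs = adjacency_to_sequences(adj)       # one bit list per added node
rebuilt = sequences_to_adjacency(seqs)   # round trip back to the matrix
```

A different node ordering of the same graph yields different bit lists, which is why the choice of ordering matters for sequential generators.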

Global approaches output a graph all at once. Graph Variational Autoencoder (GraphVAE) [67] models the existence of nodes and edges as independent random variables. By assuming the posterior distribution $q_\phi(z|G)$ defined by an encoder and the generative distribution $p_\theta(G|z)$ defined by a decoder, GraphVAE optimizes the variational lower bound:
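A standard way to write this objective as a loss to be minimized (the negative of the lower bound), with $L(\phi, \theta; G)$ as assumed notation, is

$$
L(\phi, \theta; G) = E_{q_\phi(z|G)}\left[ -\log p_\theta(G|z) \right] + KL\left[ q_\phi(z|G) \,\|\, p(z) \right],
$$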

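where $p(z)$ follows a Gaussian prior, and $\phi$ and $\theta$ are learnable parameters. With a ConvGNN as the encoder and a simple multi-layer perceptron as the decoder, GraphVAE outputs a generated graph with its adjacency matrix, node attributes, and edge attributes. It is challenging to control the global properties of generated graphs, such as graph connectivity, validity, and node compatibility. Regularized Graph Variational Autoencoder (RGVAE) [68] further imposes validity constraints on a graph variational autoencoder to regularize the output distribution of the decoder.

Molecular Generative Adversarial Network (MolGAN) [69] integrates ConvGNNs [114], GANs [115], and reinforcement learning objectives to generate graphs with desired properties. MolGAN consists of a generator and a discriminator that compete with each other to improve the authenticity of the generator's output. In MolGAN, the generator tries to propose a fake graph along with its feature matrix, while the discriminator aims to distinguish the fake sample from the empirical data. Additionally, a reward network is introduced in parallel with the discriminator to encourage the generated graphs to possess certain properties according to an external evaluator.

NetGAN [70] combines LSTMs [7] with Wasserstein GANs [116] to generate graphs using a random-walk-based approach. NetGAN trains a generator to produce plausible random walks through an LSTM network and forces a discriminator to distinguish fake random walks from real ones. After training, a new graph is derived by normalizing a co-occurrence matrix of nodes computed from the random walks produced by the generator.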
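The final step described for NetGAN, turning generated random walks into a graph, can be sketched as follows. The walks here are hard-coded stand-ins for generator output, and the specific choices (counting consecutive-node transitions, symmetrizing, normalizing to edge scores, and keeping the top-scoring edges) are assumptions made for illustration rather than the exact procedure of the cited work.

```python
import numpy as np

def cooccurrence_from_walks(walks, num_nodes):
    """Count how often two nodes appear consecutively in any walk."""
    counts = np.zeros((num_nodes, num_nodes))
    for walk in walks:
        for u, v in zip(walk[:-1], walk[1:]):
            counts[u, v] += 1
            counts[v, u] += 1        # treat the graph as undirected
    return counts

def graph_from_counts(counts, num_edges):
    """Normalize counts into edge scores and keep the top-scoring edges."""
    scores = np.triu(counts, k=1)    # each undirected pair counted once
    scores = scores / scores.sum()   # normalize so the scores sum to 1
    top = np.argsort(scores, axis=None)[::-1][:num_edges]
    rows, cols = np.unravel_index(top, scores.shape)
    adj = np.zeros(counts.shape, dtype=int)
    adj[rows, cols] = 1
    adj[cols, rows] = 1
    return adj

# Hypothetical walks standing in for samples from a trained generator.
walks = [[0, 1, 2], [2, 1, 0], [1, 2, 3], [0, 1, 2]]
counts = cooccurrence_from_walks(walks, num_nodes=4)
adj = graph_from_counts(counts, num_edges=2)
```

Instead of a hard top-k cut, edges could also be sampled in proportion to the normalized scores, which would make the generated graph stochastic.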